Mapping Collocational Properties into Machine Learning Features
Abstract
This paper investigates interactions between collocational properties and methods for organizing them into features for machine learning. In experiments performing an event categorization task, Wiebe et al. (1997a) found that different organizations are best for different properties. This paper presents a statistical analysis of the results across different machine learning algorithms. In the experiments, the relationship between property and organization was strikingly consistent across algorithms. This prompted further analysis of this relationship, and an investigation of criteria for recognizing beneficial ways to include collocational properties in machine learning experiments. While many types of collocational properties and methods of organizing them into features have been used in NLP, systematic investigations of their interaction are rare.
1 Introduction
Properties can be mapped to features in a machine learning algorithm in different ways, potentially yielding different results (see, e.g., Hu and Kibler 1996 and Pagallo and Haussler 1990). This paper investigates interactions between collocational properties and methods for organizing them into features. Collocations, conceived broadly as words meeting certain constraints that are correlated with the targeted classification, are used in a wide range of NLP applications, from word-sense disambiguation to discourse processing. They must be selected and represented in some way. Thus, this work is widely applicable to experimental design in NLP.
In experiments performing an event categorization task, Wiebe et al. (1997a) co-varied four types of organization and three types of collocational property. They found that different organizations are best for different properties, and that the best results are obtained with the most constrained properties and an organization that is not common in NLP (but see Goldberg 1995 and Cohen 1996). However, they experimented with only one machine learning algorithm, and did not offer any insight into the results. This paper presents a statistical analysis of the results across different machine learning algorithms. In the experiments, the relationship between property and organization is strikingly consistent across algorithms. This prompted further analysis of this relationship, and a study of criteria for recognizing beneficial ways to include collocations in machine learning experiments. While many types of collocational properties and methods for representing them as features have been used in NLP, systematic investigations of their interaction are rare.
The paper is organized as follows. The event categorization task is described in section 2. The collocational properties, methods for selecting collocations, and methods for organizing them into features are presented in sections 3, 4.1, and 4.2, respectively. The machine learning algorithms are identified in section 5, and the results and their statistical analysis are presented in section 6. The study of interaction between property and organization is presented in section 7.
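To make the opening point of the introduction concrete, the following minimal Python sketch shows one set of collocations mapped to features in two different ways. It is only an illustration under stated assumptions: the class labels, the collocation words, and the two organizations (one binary feature per word, and one binary feature per class) are hypothetical and are not taken from the experiments of Wiebe et al. (1997a).

```python
from typing import Dict, List, Set

# Hypothetical collocation sets, one per class. The labels and words are
# illustrative assumptions, not the ones used in the paper.
COLLOCATIONS: Dict[str, Set[str]] = {
    "state": {"seems", "remains"},
    "action": {"ran", "built"},
}


def per_word_features(tokens: List[str]) -> Dict[str, int]:
    """Organization A (assumed): one binary feature per collocation word."""
    present = set(tokens)
    return {
        f"has_{word}": int(word in present)
        for words in COLLOCATIONS.values()
        for word in sorted(words)
    }


def per_class_features(tokens: List[str]) -> Dict[str, int]:
    """Organization B (assumed): one binary feature per class, set to 1
    when any collocation word associated with that class occurs."""
    present = set(tokens)
    return {
        f"any_{label}": int(bool(words & present))
        for label, words in COLLOCATIONS.items()
    }


sentence = "the bridge remains closed".split()
print(per_word_features(sentence))
print(per_class_features(sentence))
```

Both functions encode the same underlying evidence, but the first yields many sparse features while the second yields a few denser ones, which is one way the choice of organization could plausibly affect a learning algorithm.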